Knowledge Distillation with Feature Self Attention

Authors

Abstract

With the rapid development of deep learning technology, the size and performance of networks grow continuously, making compression essential for commercial applications. In this paper, we propose a Feature Self Attention (FSA) module that extracts correlation information between hidden features as a new knowledge distillation method for model compression. FSA does not require a special architecture or matching layers between the teacher and student models. By removing the multi-head structure and the repeated self-attention blocks of the existing attention mechanism, it minimizes the number of added parameters. Based on ResNet-18 and ResNet-34, the added parameters amount to only 2.00M, and the training speed is also the fastest among the compared benchmark models. Experiments demonstrate that using the interrelationship loss can be beneficial to models, indicating the importance of considering feature correlations in neural network compression. The method is also verified by training a vanilla student from scratch, without pre-trained weights.
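
As a rough illustration of the idea described in the abstract, the sketch below shows a single-head feature self-attention block that produces a correlation map over the hidden features of a network stage, together with a loss that pulls the student's map toward the teacher's. This is not the paper's reference implementation: the choice of attending over spatial positions, the projection dimension, and the MSE matching loss are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureSelfAttention(nn.Module):
    """Single-head self-attention over one feature map: no multi-head split
    and no stacked attention blocks, so very few extra parameters."""

    def __init__(self, in_channels: int, dim: int = 64):
        super().__init__()
        self.query = nn.Linear(in_channels, dim)
        self.key = nn.Linear(in_channels, dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> sequence of H*W feature vectors of size C
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)        # (B, H*W, C)
        q, k = self.query(tokens), self.key(tokens)     # (B, H*W, dim)
        scale = q.shape[-1] ** 0.5
        # Correlation (self-attention) map between positions: (B, H*W, H*W)
        return torch.softmax(q @ k.transpose(1, 2) / scale, dim=-1)


def fsa_distillation_loss(student_feat, teacher_feat, student_fsa, teacher_fsa):
    """Penalize the distance between student and teacher correlation maps."""
    s_map = student_fsa(student_feat)
    with torch.no_grad():                # teacher is frozen during distillation
        t_map = teacher_fsa(teacher_feat)
    return F.mse_loss(s_map, t_map)

In a ResNet-18 student / ResNet-34 teacher setup, a loss of this form would simply be added to the student's usual cross-entropy objective.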

Similar Articles

Topic Distillation with Knowledge Agents

This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...

Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...
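
For context, the distillation objective cited in this abstract (Hinton et al., 2015) trains the student on the teacher's temperature-softened output distribution in addition to the ground-truth labels. A minimal sketch of that class-level loss follows; the temperature and weighting values are illustrative assumptions, not settings from the paper.

import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Soft-target distillation term plus the ordinary hard-label term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # rescale as in Hinton et al. (2015)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard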

The relationship between Mindfulness and Attention with Academic Self-efficacy

The purpose of this research was to study the relationship between mindfulness and attention with academic self-efficacy in high school students in the city of Lordegan during the 94-95 academic year. The study was descriptive-correlational. The statistical population consisted of all high school students, numbering 2814 people. Three hundred high school students from Lordegan were c...

Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

There is increasing interest in accelerating neural networks for real-time applications. We study the student-teacher strategy, in which a small and fast student network is trained with the auxiliary information provided by a large and accurate teacher network. We use conditional adversarial networks to learn the loss function that transfers knowledge from teacher to student. The proposed method...

Learning Efficient Object Detection Models with Knowledge Distillation

Despite significant accuracy improvements in convolutional neural network (CNN) based object detectors, they often require prohibitive runtimes to process an image for real-time applications. State-of-the-art models often use very deep networks with a large number of floating point operations. Efforts such as model compression learn compact models with fewer parameters, but with much ...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3265382